Distributed Triangle Counting in the Graphulo Matrix Math Library

Author: Hutchison Dylan
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 05/09/2017

Triangle counting is a key algorithm for large graph analysis. The Graphulo library provides a framework for implementing graph algorithms on the Apache Accumulo distributed database. In this work we adapt two algorithms for counting triangles, one that uses the adjacency matrix and another that also uses the incidence matrix, to the Graphulo library for server-side processing inside Accumulo. Cloud-based experiments show a similar performance profile for these different approaches on the family of power law Graph500 graphs, for which data skew increasingly bottlenecks. These results motivate the design of skew-aware hybrid algorithms that we propose for future work. Comment: Honorable mention in the 2017 IEEE HPEC's Graph Challenge

arXiv.org e-Print Archive

Crossref

Graphulo Implementation of Server-Side Sparse Matrix Multiply in the Accumulo Database

Author: Fuchs Adam
Gadepally Vijay
Hutchison Dylan
Kepner Jeremy
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 30/08/2015

The Apache Accumulo database excels at distributed storage and indexing and is ideally suited for storing graph data. Many big data analytics compute on graph data and persist their results back to the database. These graph calculations are often best performed inside the database server. The GraphBLAS standard provides a compact and efficient basis for a wide range of graph applications through a small number of sparse matrix operations. In this article, we implement GraphBLAS sparse matrix multiplication server-side by leveraging Accumulo's native, high-performance iterators. We compare the mathematics and performance of inner and outer product implementations, and show how an outer product implementation achieves optimal performance near Accumulo's peak write rate. We offer our work as a core component to the Graphulo library that will deliver matrix math primitives for graph analytics within Accumulo. Comment: To be presented at IEEE HPEC 2015: http://www.ieee-hpec.org

arXiv.org e-Print Archive

Crossref

D4M 3.0: Extended Database and Language Capabilities

Author: Chen Alexander
Gadepally Vijay
Hutchison Dylan
Kepner Jeremy
Milechin Lauren
Samsi Siddharth
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 09/08/2017

The D4M tool was developed to address many of today's data needs. This tool is used by hundreds of researchers to perform complex analytics on unstructured data. Over the past few years, the D4M toolbox has evolved to support connectivity with a variety of new database engines, including SciDB. D4M-Graphulo provides the ability to do graph analytics in the Apache Accumulo database. Finally, an implementation using the Julia programming language is also now available. In this article, we describe some of our latest additions to the D4M toolbox and our upcoming D4M 3.0 release. We show through benchmarking and scaling results that we can achieve fast SciDB ingest using the D4M-SciDB connector, that using Graphulo can enable graph algorithms on scales that can be memory limited, and that the Julia implementation of D4M achieves comparable performance or exceeds that of the existing MATLAB(R) implementation. Comment: IEEE HPEC 201

arXiv.org e-Print Archive

Crossref

Polystore mathematics of relational algebra

Author: Gadepally Vijay
Hutchison Dylan
Jananthan Hayden
Kepner Jeremy
Kim Suna
Zhou Ziqi
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/12/2017

Financial transactions, internet search, and data analysis are all placing increasing demands on databases. SQL, NoSQL, and NewSQL databases have been developed to meet these demands and each offers unique benefits. SQL, NoSQL, and NewSQL databases also rely on different underlying mathematical models. Polystores seek to provide a mechanism to allow applications to transparently achieve the benefits of diverse databases while insulating applications from the details of these databases. Integrating the underlying mathematics of these diverse databases can be an important enabler for polystores as it enables effective reasoning across different databases. Associative arrays provide a common approach for the mathematics of polystores by encompassing the mathematics found in different databases: sets (SQL), graphs (NoSQL), and matrices (NewSQL). Prior work presented the SQL relational model in terms of associative arrays and identified key mathematical properties that are preserved within SQL. This work provides the rigorous mathematical definitions, lemmas, and theorems underlying these properties. Specifically, SQL Relational Algebra deals primarily with relations - multisets of tuples - and operations on and between those relations. These relations can be modeled as associative arrays by treating tuples as non-zero rows in an array. Operations in relational algebra are built as compositions of standard operations on associative arrays which mirror their matrix counterparts. These constructions provide insight into how relational algebra can be handled via array operations. As an example application, the composition of two projection operations is shown to also be a projection, and the projection of a union is shown to be equal to the union of the projections

arXiv.org e-Print Archive

Caltech Authors

Julia implementation of the Dynamic Distributed Dimensional Data Model

Author: Chen Alexander Y.
Edelman Alan
Gadepally Vijay N.
Hutchison Dylan D.
Kepner Jeremy
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 18/05/2018

Julia is a new language for writing data analysis programs that are easy to implement and run at high performance. Similarly, the Dynamic Distributed Dimensional Data Model (D4M) aims to clarify data analysis operations while retaining strong performance. D4M accomplishes these goals through a composable, unified data model on associative arrays. In this work, we present an implementation of D4M in Julia and describe how it enables and facilitates data analysis. Several experiments showcase scalable performance in our new Julia version as compared to the original Matlab implementation

DSpace@MIT

Toxicity of Arsenic in Iron King Mine PM₁₀ Tailings is Mitigated by Synthetic Alveolar Lung Fluid

Author: Hutchison Dylan Michael
Publication venue: The University of Arizona.
Publication date: 01/01/2016

This paper provides a risk assessment of pertinent toxic contaminants in the tailings of the Iron King Mine using a model of aeolian transport fated in human alveolar lung. Here, we studied particulate matter of tailings that are 10 microns (µm) or less in diameter (PM₁₀) because this is the most hazardous fraction. We used in-vitro bioaccessibility and in-vivo Microtox® data to determine the relationships between chronic inhalation of these tailings. Our data suggest that arsenic and zinc are the two principal drivers for toxicity of the Iron King Mine’s PM₁₀ tailings and that arsenic will solubilize in human alveolar biofluids at the expense of other noteworthy elemental contaminants in the tailings. The principal contaminant of concern for chronic exposure is arsenic, due to its increased bioaccessibility over time. Our data show that synthetic lung fluid (SLF) mitigates the toxic effects of arsenic, despite its increase in bioaccessibility over time. Therefore, we suggest a buffering mechanism of phosphate competition with arsenate to explain this mitigation of toxicity in SLF. We conclude that public health risk of chronic inhalation of IKM PM₁₀ tailings may be less severe than would otherwise be suggested by high concentrations of toxic contamination in the tailings impoundment.

The University of Arizona